Samsung Poland NLP Team at SemEval-2016 Task 1: Necessity for diversity; combining recursive autoencoders, WordNet and ensemble methods to measure semantic similarity

Authors

  • Barbara Rychalska
  • Katarzyna Pakulska
  • Krystyna Chodorowska
  • Wojciech Walczak
  • Piotr Andruszkiewicz
Abstract

This paper describes our solutions for the STS Core track of the SemEval 2016 English Semantic Textual Similarity (STS) task. Our similarity detection method combines recursive autoencoders with a WordNet award-penalty system that accounts for semantic relatedness, and an SVM classifier that produces the final score from similarity matrices. This solution is further supported by an ensemble classifier that combines an aligner with a bi-directional Gated Recurrent Neural Network and additional features, and then performs Linear Support Vector Regression to determine another set of scores.
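As a rough illustration of the similarity-matrix idea mentioned in the abstract, the sketch below builds a word-by-word cosine similarity matrix for two short sentences and adjusts each cell with a simple award-penalty rule. It is not the authors' system: the toy vectors in `VECTORS`, the `RELATED` set standing in for a WordNet lookup, the `award` and `penalty` values, and the `sentence_score` pooling are all assumptions made for illustration, and the recursive autoencoder and SVM stages are omitted entirely.

```python
import math

# Toy 2-d word vectors (hypothetical; a real system would learn embeddings).
VECTORS = {
    "cat": (1.0, 0.0),
    "dog": (0.8, 0.6),
    "sleeps": (0.0, 1.0),
    "rests": (0.6, 0.8),
}

# Hypothetical stand-in for a WordNet relatedness lookup.
RELATED = {frozenset(("sleeps", "rests"))}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(s1, s2, award=0.2, penalty=0.2):
    """Word-by-word similarity with an assumed award/penalty adjustment."""
    matrix = []
    for w1 in s1:
        row = []
        for w2 in s2:
            score = cosine(VECTORS[w1], VECTORS[w2])
            if w1 == w2 or frozenset((w1, w2)) in RELATED:
                score += award      # reward identical or related words
            else:
                score -= penalty    # penalise unrelated words
            row.append(min(1.0, max(0.0, score)))  # clamp to [0, 1]
        matrix.append(row)
    return matrix


def sentence_score(matrix):
    """Average of each row's best match (a simple, assumed pooling step)."""
    return sum(max(row) for row in matrix) / len(matrix)


matrix = similarity_matrix(["cat", "sleeps"], ["dog", "rests"])
score = sentence_score(matrix)
```

In a fuller pipeline, a matrix like this would be turned into fixed-size features and passed to a trained classifier or regressor rather than averaged directly.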

Similar resources

IRIT: Textual Similarity Combining Conceptual Similarity with an N-Gram Comparison Method

This paper describes the participation of the IRIT team in SemEval 2012 Task 6 (Semantic Textual Similarity). The method consists of an n-gram based comparison method combined with a conceptual similarity measure that uses WordNet to calculate the similarity between a pair of concepts.

HHU at SemEval-2016 Task 1: Multiple Approaches to Measuring Semantic Textual Similarity

This paper describes our participation in the SemEval-2016 Task 1: Semantic Textual Similarity (STS). We developed three methods for the English subtask (STS Core). The first method is unsupervised and uses WordNet and word2vec to measure a token-based overlap. In our second approach, we train a neural network on two features. The third method uses word2vec and LDA with regression splines.

LIPN-CORE: Semantic Text Similarity using n-grams, WordNet, Syntactic Analysis, ESA and Information Retrieval based Features

This paper describes the system used by the LIPN team in the Semantic Textual Similarity task at SemEval 2013. It uses a support vector regression model, combining different text similarity measures that constitute the features. These measures include simple distances like Levenshtein edit distance, cosine, Named Entities overlap and more complex distances like Explicit Semantic Analysis, WordN...

AMRITA_CEN@SemEval-2015: Paraphrase Detection for Twitter using Unsupervised Feature Learning with Recursive Autoencoders

We explore using recursive autoencoders for SemEval 2015 Task 1: Paraphrase and Semantic Similarity in Twitter. Our paraphrase detection system makes use of phrase-structure parse tree embeddings that are then provided as input to a conventional supervised classification model. We achieve an F1 score of 0.45 on paraphrase identification and a Pearson correlation of 0.303 on computing semantic s...

VRep at SemEval-2016 Task 1 and Task 2: A System for Interpretable Semantic Similarity

VRep is a system designed for SemEval 2016 Task 1 Semantic Textual Similarity (STS) and Task 2 Interpretable Semantic Textual Similarity (iSTS). STS quantifies the semantic equivalence between two snippets of text, and iSTS provides a reason why those snippets of text are similar. VRep makes extensive use of WordNet for both STS, where the Vector relatedness measure is used, and for iSTS, where...


Publication date: 2016